Section 1. Matplotlib Charts

In [264]:
# Import the packages
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
import pandas as pd
import datetime as dt
In [265]:
# Load data on shootings
shoot = pd.read_csv("C:/Users/alber/OneDrive/Desktop/MUSA620/HW2/shootings.csv")
shoot.head(n=20)  
Out[265]:
the_geom lng objectid year dc_key code date_ race sex age ... point_x point_y dist time inside outside fatal lat offender_injured count
0 0101000020E6100000F573969B60C952C0918E9BAFCCFE... -75.146521 140176 NaN 2.02E+11 111 9/10/2019 W F 23 ... -75.146521 39.990622 26.0 NaN 0.0 1.0 1.0 39.990622 N 1
1 0101000020E6100000F573969B60C952C0918E9BAFCCFE... -75.146521 140177 NaN 2.02E+11 411 9/10/2019 B M 54 ... -75.146521 39.990622 26.0 NaN 0.0 1.0 0.0 39.990622 N 1
2 0101000020E610000045BFCF1BDBCD52C04F0C03754EF7... -75.216498 140178 NaN 2.02E+11 411 9/11/2019 B M 23 ... -75.216498 39.932082 12.0 NaN 0.0 1.0 0.0 39.932082 N 1
3 0101000020E61000002D70A22BCBCC52C0A7D7DAB9B8FB... -75.199901 140191 NaN 2.02E+11 411 9/11/2019 B M 34 ... -75.199901 39.966575 16.0 NaN 0.0 1.0 0.0 39.966575 N 1
4 0101000020E61000002D70A22BCBCC52C0A7D7DAB9B8FB... -75.199901 140192 NaN 2.02E+11 411 9/11/2019 B M 37 ... -75.199901 39.966575 16.0 NaN 0.0 1.0 0.0 39.966575 N 1
5 0101000020E610000067544B009FCD52C0E2751E9146F8... -75.212830 140481 NaN 2.02E+11 411 9/11/2019 B M 50 ... -75.212830 39.939654 12.0 NaN 0.0 1.0 0.0 39.939654 N 1
6 0101000020E610000090596A7B83CA52C0F45DBEB34506... -75.164275 140482 NaN 2.02E+11 300 9/11/2019 W M 54 ... -75.164275 40.049002 14.0 NaN 0.0 1.0 0.0 40.049002 N 1
7 0101000020E6100000C7AA392753C852C0FB01301A09FF... -75.130075 133968 NaN 2.02E+11 411 6/26/2017 W F 25 ... -75.130075 39.992465 24.0 NaN 0.0 1.0 0.0 39.992465 N 1
8 0101000020E6100000BE2DFDF018CA52C0FDB60EABCBF5... -75.157772 133969 NaN 2.02E+11 111 6/26/2017 B M 18 ... -75.157772 39.920278 3.0 NaN 0.0 1.0 1.0 39.920278 N 1
9 0101000020E61000003E4506A0AEC952C018840BDC2E01... -75.151283 133970 NaN 2.02E+11 411 6/28/2017 B M 17 ... -75.151283 40.009243 39.0 NaN 0.0 1.0 0.0 40.009243 N 1
10 0101000020E610000045A7007BF9CB52C0682B69A4A9FD... -75.187102 133971 NaN 2.02E+11 411 6/27/2017 W M 29 ... -75.187102 39.981740 22.0 NaN 0.0 1.0 0.0 39.981740 N 1
11 0101000020E6100000DF3F8AF8D6CB52C02697F71F36F7... -75.184996 133972 NaN 2.02E+11 411 6/27/2017 B M 28 ... -75.184996 39.931339 17.0 NaN 0.0 1.0 0.0 39.931339 N 1
12 0101000020E6100000437E89BA47C952C07D4B10E03AFD... -75.145003 133973 NaN 2.02E+11 111 6/28/2017 W M 30 ... -75.145003 39.978359 26.0 NaN 0.0 1.0 1.0 39.978359 N 1
13 0101000020E6100000D263C6A1DFCE52C0F09272D6ABF7... -75.232399 133974 NaN 2.02E+11 411 6/28/2017 B F 16 ... -75.232399 39.934932 12.0 NaN 0.0 1.0 0.0 39.934932 N 1
14 0101000020E610000073EA3503BDC052C04469B10C5806... -75.011536 133975 NaN 2.02E+11 411 6/28/2017 B M 25 ... -75.011536 40.049562 8.0 NaN 0.0 1.0 0.0 40.049562 N 1
15 0101000020E61000007E2C8E145BCB52C0C46956F0BAF8... -75.177434 133976 NaN 2.02E+11 3006 7/15/2017 B M 19 ... -75.177434 39.943205 17.0 NaN 0.0 1.0 0.0 39.943205 N 1
16 0101000020E61000007E2C8E145BCB52C0C46956F0BAF8... -75.177434 133977 NaN 2.02E+11 3006 7/15/2017 B M 24 ... -75.177434 39.943205 17.0 NaN 0.0 1.0 0.0 39.943205 N 1
17 0101000020E61000007E2C8E145BCB52C0C46956F0BAF8... -75.177434 133978 NaN 2.02E+11 3006 7/15/2017 B M 26 ... -75.177434 39.943205 17.0 NaN 0.0 1.0 0.0 39.943205 N 1
18 0101000020E6100000249F1520A3C952C02C7FDF3BC1F7... -75.150581 133979 NaN 2.02E+11 111 7/16/2017 B M 42 ... -75.150581 39.935585 3.0 NaN 0.0 1.0 1.0 39.935585 N 1
19 0101000020E6100000D787B32A5BCB52C047D73B9201FF... -75.177439 133980 NaN 2.02E+11 411 7/15/2017 B M 35 ... -75.177439 39.992235 22.0 NaN 1.0 0.0 0.0 39.992235 N 1

20 rows × 26 columns

In [266]:
# Remove unwanted columns
unwanted = ['the_geom', 'lng','year','dc_key','code','the_geom_webmercator','offender_deceased','dist','time','inside','outside','lat','offender_injured']
shooting = shoot.drop(unwanted, axis=1)
In [267]:
shooting
Out[267]:
objectid date_ race sex age wound officer_involved location latino point_x point_y fatal count
0 140176 9/10/2019 W F 23 head/multi N 2500 BLOCK N 9th St 1.0 -75.146521 39.990622 1.0 1
1 140177 9/10/2019 B M 54 shoulder N 2500 BLOCK N 9th St 0.0 -75.146521 39.990622 0.0 1
2 140178 9/11/2019 B M 23 back N 5500 BLOCK Linbergh Blvd 0.0 -75.216498 39.932082 0.0 1
3 140191 9/11/2019 B M 34 hip N 3800 BLOCK Aspen St 0.0 -75.199901 39.966575 0.0 1
4 140192 9/11/2019 B M 37 legs N 3800 BLOCK Aspen St 0.0 -75.199901 39.966575 0.0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ...
6206 134868 2/7/2016 B M 22 stom N 200 BLOCK W Nedro Ave 0.0 -75.124602 40.040384 0.0 1
6207 134869 2/11/2016 B M 21 head N 5900 BLOCK Malta St 0.0 -75.100304 40.043923 0.0 1
6208 134870 2/11/2016 B M 19 arm N 1800 BLOCK WSusquehana Ave 0.0 -75.163623 39.987211 0.0 1
6209 134871 2/9/2016 B M 17 shoulder N 7400 BLOCK Frankford Ave 0.0 -75.037997 40.037857 0.0 1
6210 134872 2/13/2016 B M 24 multi/head N 0 BLOCK N Paxon St 0.0 -75.223822 39.959912 1.0 1

6211 rows × 13 columns

In [268]:
# Convert date data from strings to datetime objects
shooting['date_'] = pd.to_datetime(shooting['date_'])
shooting['Month'] = shooting['date_'].dt.month
shooting['Year'] = shooting['date_'].dt.year
shooting['Day'] = shooting['date_'].dt.day
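The date strings in this file look like month/day/year (for example `9/10/2019`). As a minimal sketch, assuming that layout holds for every row, an explicit `format` can be passed so the parse does not rely on inference. The three-row `sample` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical three-row sample that mimics the 'date_' column
sample = pd.DataFrame({"date_": ["9/10/2019", "6/26/2017", "2/7/2016"]})

# Parse with an explicit month/day/year format (assumed layout)
sample["date_"] = pd.to_datetime(sample["date_"], format="%m/%d/%Y")

# Derive the calendar parts used later for grouping
sample["Year"] = sample["date_"].dt.year
sample["Month"] = sample["date_"].dt.month
sample["Day"] = sample["date_"].dt.day
print(sample)
```

If a row does not match the assumed format, `pd.to_datetime` raises an error instead of silently guessing, which makes bad records easier to spot.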
In [269]:
shooting.head(n=20)
Out[269]:
objectid date_ race sex age wound officer_involved location latino point_x point_y fatal count Month Year Day
0 140176 2019-09-10 W F 23 head/multi N 2500 BLOCK N 9th St 1.0 -75.146521 39.990622 1.0 1 9 2019 10
1 140177 2019-09-10 B M 54 shoulder N 2500 BLOCK N 9th St 0.0 -75.146521 39.990622 0.0 1 9 2019 10
2 140178 2019-09-11 B M 23 back N 5500 BLOCK Linbergh Blvd 0.0 -75.216498 39.932082 0.0 1 9 2019 11
3 140191 2019-09-11 B M 34 hip N 3800 BLOCK Aspen St 0.0 -75.199901 39.966575 0.0 1 9 2019 11
4 140192 2019-09-11 B M 37 legs N 3800 BLOCK Aspen St 0.0 -75.199901 39.966575 0.0 1 9 2019 11
5 140481 2019-09-11 B M 50 leg N 4900 BLOCK Saybrook Ave 0.0 -75.212830 39.939654 0.0 1 9 2019 11
6 140482 2019-09-11 W M 54 hand N 900 BLOCK E Rittenhouse St 0.0 -75.164275 40.049002 0.0 1 9 2019 11
7 133968 2017-06-26 W F 25 leg N 2800 BLOCK N Lee St 1.0 -75.130075 39.992465 0.0 1 6 2017 26
8 133969 2017-06-26 B M 18 chest/back N 2300 BLOCK S Marshall St 0.0 -75.157772 39.920278 1.0 1 6 2017 26
9 133970 2017-06-28 B M 17 shoulder N 1400 BLOCK W Erie Ave 0.0 -75.151283 40.009243 0.0 1 6 2017 28
10 133971 2017-06-27 W M 29 thigh N 3200 BLOCK W Turner St 0.0 -75.187102 39.981740 0.0 1 6 2017 27
11 133972 2017-06-27 B M 28 thigh N 2400 BLOCK Morris St 0.0 -75.184996 39.931339 0.0 1 6 2017 27
12 133973 2017-06-28 W M 30 multi N 1700 BLOCK N 6th St 0.0 -75.145003 39.978359 1.0 1 6 2017 28
13 133974 2017-06-28 B F 16 multi N 5900 BLOCK Springfield Ave 0.0 -75.232399 39.934932 0.0 1 6 2017 28
14 133975 2017-06-28 B M 25 leg N 8800 BLOCK Frankford Ave 0.0 -75.011536 40.049562 0.0 1 6 2017 28
15 133976 2017-07-15 B M 19 leg N 2100 BLOCK Fitzwater St 0.0 -75.177434 39.943205 0.0 1 7 2017 15
16 133977 2017-07-15 B M 24 shouldr N 2100 BLOCK Fitzwater St 0.0 -75.177434 39.943205 0.0 1 7 2017 15
17 133978 2017-07-15 B M 26 leg N 2100 BLOCK Fitzwater St 0.0 -75.177434 39.943205 0.0 1 7 2017 15
18 133979 2017-07-16 B M 42 chest N 1000 BLOCK S4TH St 0.0 -75.150581 39.935585 1.0 1 7 2017 16
19 133980 2017-07-15 B M 35 stomach N 2700 BLOCK York St 0.0 -75.177439 39.992235 0.0 1 7 2017 15
In [270]:
# Check the elements in column 'Year'
shooting['Year'].unique()
Out[270]:
array([2019, 2017, 2016, 2015, 2018], dtype=int64)
In [271]:
# Keep only the rows with a recognized race code (drops unexpected values)
valid_race = ['W','B','A','I']
In [272]:
selection = shooting['race'].isin(valid_race)
shooting_used = shooting.loc[selection]
In [273]:
# Check the elements in column "race"
shooting_used['race'].unique()
Out[273]:
array(['W', 'B', 'A', 'I'], dtype=object)
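The outputs in this notebook show the effect of the filter: the unfiltered table has 6211 rows, while `shooting_used` (printed in Section 3) has 6134, so 77 rows carried some other race value. A small sketch of how the dropped values could be inspected before discarding them follows; the values in `toy` are hypothetical, since the real unexpected codes are not shown here.

```python
import pandas as pd

# Hypothetical values; the real file's unexpected codes are not shown above
toy = pd.DataFrame({"race": ["B", "W", "A", "I", "b", None, "M"]})

valid_race = ["W", "B", "A", "I"]
mask = toy["race"].isin(valid_race)

# Inspect what would be dropped before dropping it
dropped = toy.loc[~mask, "race"].value_counts(dropna=False)
kept = toy.loc[mask]

print(dropped)
print(len(toy) - len(kept), "rows removed")
```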
In [274]:
# Calculate the total amount of shootings for each race
shooting_Race = shooting_used.groupby(['race'])['count'].count()
In [275]:
# Reset the index so that the index values are listed as columns in the data frame again
shooting_Race = shooting_Race.reset_index()
In [276]:
shooting_Race
Out[276]:
race count
0 A 41
1 B 5095
2 I 2
3 W 996
In [277]:
# Rename the columns
shooting_Race.columns = ['Race','Shooting_Amount']
In [278]:
shooting_Race
Out[278]:
Race Shooting_Amount
0 A 41
1 B 5095
2 I 2
3 W 996

First Matplotlib Chart: Pie Chart

In [279]:
# Initialize the figure
plt.figure(figsize=(16,8))

# Plot a pie chart
ax1 = plt.subplot(121, aspect='equal')
explode = (0, 0.1, 0, 0)  # only "explode" the 2nd slice (i.e. 'B', the largest)
shooting_Race.plot(kind='pie', explode=explode, y = 'Shooting_Amount', ax=ax1, autopct='%1.1f%%', 
 startangle=90, shadow=False, labels=shooting_Race['Race'], legend = True, fontsize=14)
Out[279]:
<matplotlib.axes._subplots.AxesSubplot at 0x19410ebe3c8>

Discussion

  • Process and reasons for making the graph:

I first noticed that the shooting victims are racially diverse, so I made a pie chart to see how large a share of the victims each race accounts for. Because Matplotlib is a basic, low-level plotting package, it gives me a lot of freedom to customize the graph. After setting up the figure and axes, I passed in the shooting counts grouped by race. I then set the font size so that the percentage for each race displays legibly on the chart. Finally, I exploded the chart by pulling the largest slice away from the others.

  • Findings from the chart:

From the chart, I found that: 1) More than 80% of the shooting victims between 2015 and 2019 were Black. 2) Only a small number of victims were Asian or Indian.
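The percentages drawn on the pie chart can be checked directly from the counts in the `shooting_Race` table. The sketch below retypes those four counts rather than reading the CSV; the column name `Share_pct` is introduced here for illustration.

```python
import pandas as pd

# Counts copied from the shooting_Race table above
shooting_race = pd.DataFrame(
    {"Race": ["A", "B", "I", "W"], "Shooting_Amount": [41, 5095, 2, 996]}
)

# Share of each race in the total, as a percentage rounded to one decimal
total = shooting_race["Shooting_Amount"].sum()
shooting_race["Share_pct"] = (shooting_race["Shooting_Amount"] / total * 100).round(1)

print(total)
print(shooting_race)
```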

In [280]:
# Count the shootings for each year, race, and sex
shooting_RaceSex = shooting_used.groupby(['Year','race','sex'])['count'].count()
In [281]:
# Reset the index so that the index values are listed as columns in the data frame again
shooting_RaceSex = shooting_RaceSex.reset_index()
In [282]:
shooting_RaceSex
Out[282]:
Year race sex count
0 2015 A M 11
1 2015 B F 71
2 2015 B M 966
3 2015 W F 31
4 2015 W M 156
5 2016 A M 8
6 2016 B F 62
7 2016 B M 988
8 2016 W F 30
9 2016 W M 190
10 2017 A F 2
11 2017 A M 5
12 2017 B F 64
13 2017 B M 927
14 2017 W F 24
15 2017 W M 202
16 2018 A F 2
17 2018 A M 8
18 2018 B F 94
19 2018 B M 1094
20 2018 W F 26
21 2018 W M 177
22 2019 A F 1
23 2019 A M 4
24 2019 B F 70
25 2019 B M 759
26 2019 I F 1
27 2019 I M 1
28 2019 W F 20
29 2019 W M 140
In [283]:
# Create a new column by concatenating the columns of races and sexes
shooting_RaceSex['Race_and_Sex'] = shooting_RaceSex['race'] +'-'+ shooting_RaceSex['sex']
print(shooting_RaceSex)
    Year race sex  count Race_and_Sex
0   2015    A   M     11          A-M
1   2015    B   F     71          B-F
2   2015    B   M    966          B-M
3   2015    W   F     31          W-F
4   2015    W   M    156          W-M
5   2016    A   M      8          A-M
6   2016    B   F     62          B-F
7   2016    B   M    988          B-M
8   2016    W   F     30          W-F
9   2016    W   M    190          W-M
10  2017    A   F      2          A-F
11  2017    A   M      5          A-M
12  2017    B   F     64          B-F
13  2017    B   M    927          B-M
14  2017    W   F     24          W-F
15  2017    W   M    202          W-M
16  2018    A   F      2          A-F
17  2018    A   M      8          A-M
18  2018    B   F     94          B-F
19  2018    B   M   1094          B-M
20  2018    W   F     26          W-F
21  2018    W   M    177          W-M
22  2019    A   F      1          A-F
23  2019    A   M      4          A-M
24  2019    B   F     70          B-F
25  2019    B   M    759          B-M
26  2019    I   F      1          I-F
27  2019    I   M      1          I-M
28  2019    W   F     20          W-F
29  2019    W   M    140          W-M
In [284]:
# Rename the columns
shooting_RaceSex.columns = ['Year','Race','Sex','Amount','Race_and_Sex']
In [285]:
shooting_RaceSex
Out[285]:
Year Race Sex Amount Race_and_Sex
0 2015 A M 11 A-M
1 2015 B F 71 B-F
2 2015 B M 966 B-M
3 2015 W F 31 W-F
4 2015 W M 156 W-M
5 2016 A M 8 A-M
6 2016 B F 62 B-F
7 2016 B M 988 B-M
8 2016 W F 30 W-F
9 2016 W M 190 W-M
10 2017 A F 2 A-F
11 2017 A M 5 A-M
12 2017 B F 64 B-F
13 2017 B M 927 B-M
14 2017 W F 24 W-F
15 2017 W M 202 W-M
16 2018 A F 2 A-F
17 2018 A M 8 A-M
18 2018 B F 94 B-F
19 2018 B M 1094 B-M
20 2018 W F 26 W-F
21 2018 W M 177 W-M
22 2019 A F 1 A-F
23 2019 A M 4 A-M
24 2019 B F 70 B-F
25 2019 B M 759 B-M
26 2019 I F 1 I-F
27 2019 I M 1 I-M
28 2019 W F 20 W-F
29 2019 W M 140 W-M
In [286]:
# Get the total number of shootings for each year from 2015 to 2019
shooting_Number = shooting_RaceSex.groupby(['Year'])['Amount'].sum()
In [287]:
# Reset the index so that the index values are listed as columns in the data frame again
shooting_Number = shooting_Number.reset_index()
shooting_Number.columns = ['Year','Total']
In [288]:
shooting_Number
Out[288]:
Year Total
0 2015 1235
1 2016 1278
2 2017 1224
3 2018 1401
4 2019 996
In [289]:
# Merge the two dataframes
Merge = shooting_RaceSex.merge(shooting_Number, on='Year')
In [290]:
# Calculate the percentage of shootings for each race and sex to the total number for each year
Merge['Percent']=Merge['Amount']/Merge['Total']*100
In [291]:
Merge
Out[291]:
Year Race Sex Amount Race_and_Sex Total Percent
0 2015 A M 11 A-M 1235 0.890688
1 2015 B F 71 B-F 1235 5.748988
2 2015 B M 966 B-M 1235 78.218623
3 2015 W F 31 W-F 1235 2.510121
4 2015 W M 156 W-M 1235 12.631579
5 2016 A M 8 A-M 1278 0.625978
6 2016 B F 62 B-F 1278 4.851330
7 2016 B M 988 B-M 1278 77.308294
8 2016 W F 30 W-F 1278 2.347418
9 2016 W M 190 W-M 1278 14.866980
10 2017 A F 2 A-F 1224 0.163399
11 2017 A M 5 A-M 1224 0.408497
12 2017 B F 64 B-F 1224 5.228758
13 2017 B M 927 B-M 1224 75.735294
14 2017 W F 24 W-F 1224 1.960784
15 2017 W M 202 W-M 1224 16.503268
16 2018 A F 2 A-F 1401 0.142755
17 2018 A M 8 A-M 1401 0.571021
18 2018 B F 94 B-F 1401 6.709493
19 2018 B M 1094 B-M 1401 78.087081
20 2018 W F 26 W-F 1401 1.855817
21 2018 W M 177 W-M 1401 12.633833
22 2019 A F 1 A-F 996 0.100402
23 2019 A M 4 A-M 996 0.401606
24 2019 B F 70 B-F 996 7.028112
25 2019 B M 759 B-M 996 76.204819
26 2019 I F 1 I-F 996 0.100402
27 2019 I M 1 I-M 996 0.100402
28 2019 W F 20 W-F 996 2.008032
29 2019 W M 140 W-M 996 14.056225

Second Matplotlib Chart: Line Chart

In [292]:
# Initialize the figure and axes
fig, ax = plt.subplots(figsize=(15, 10))

# Color for each race-sex
color_map = {"A-M": "#F2C335","A-F": "#F2D479", "B-M": "#000000","B-F": "#6C6F73", "W-M": "#D53711" ,"W-F": "#DB6E53" ,"I-M": "#034AA6","I-F": "#79D0F2"}

# Plot each race-sex
for Race_and_Sex, group in Merge.groupby("Race_and_Sex"):
    print(f"Plotting {Race_and_Sex}...")

    # Plot year vs amount of shootings for this group
    ax.plot(
        group["Year"],
        group["Amount"],
        marker="o",
        label=Race_and_Sex,
        color=color_map[Race_and_Sex],
        alpha=1,  # alpha must lie between 0 and 1
    )

# Format the axes
ax.legend(loc="best")
ax.set_xlabel("Year")
ax.set_ylabel("Number of Shootings per Year")

ax.set_ylim(-150, 1300)
ax.grid(True)
Plotting A-F...
Plotting A-M...
Plotting B-F...
Plotting B-M...
Plotting I-F...
Plotting I-M...
Plotting W-F...
Plotting W-M...

Discussion

  • Process and reasons for making the graph:

Having seen the overall racial composition of the shooting victims, I wanted to look at how the number of shootings changed from 2015 to 2019 across both race and sex groups, so I made a line chart. As before, Matplotlib's low-level interface gave me a lot of freedom to customize the graph. After setting up the figure and axes, I assigned a color to each group: yellow for Asian, black for Black, red for White, and blue for Indian victims, with a darker shade for males than for females in each race. Putting the year on the x-axis and the number of shooting victims per year on the y-axis produced the line chart.

  • Findings from the chart:

From the chart, I found that: 1) The number of Black male victims peaked in 2018 after a short-term decrease from 2016 to 2017. 2) The number of White male victims rose slightly from 2015 to 2017 and declined from 2017 to 2018. 3) There were few Asian and Indian victims, so their numbers appear stable throughout. 4) Male victims outnumber female victims in every race and year, except for the two Indian victims in 2019 (one male, one female).
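The year-to-year movements described above can be read off numerically as well. This sketch retypes the Black male and White male counts from the `Merge` table and uses `DataFrame.diff` to get the change from the previous year; keep in mind that the 2019 counts cover only part of the year.

```python
import pandas as pd

# Yearly counts copied from the Merge table above
counts = pd.DataFrame(
    {
        "Year": [2015, 2016, 2017, 2018, 2019],
        "B-M": [966, 988, 927, 1094, 759],
        "W-M": [156, 190, 202, 177, 140],
    }
).set_index("Year")

# Difference from the previous year; the first year has no predecessor (NaN)
change = counts.diff()
print(change)
```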

Section 2. Seaborn Charts

In [293]:
# Import the package
import seaborn as sns

First Seaborn Chart: Bar Chart

In [294]:
# Initialize the figure and axes
fig, ax = plt.subplots(figsize=(10, 8))

# Plot the chart
ax = sns.barplot(x="Race_and_Sex", y="Percent", hue="Year", palette='magma', data=Merge)

Discussion

  • Process and reasons for making the graph:

Having seen the general trends in the number of shooting victims across race-sex groups, I wanted to look at how each group's percentage of all victims changed from 2015 to 2019, to see whether specific groups have become more vulnerable in recent years. I made a bar chart with Seaborn, a higher-level library built on Matplotlib that needs less code for this kind of visualization. Putting the race-sex groups on the x-axis and the percentage of shooting victims on the y-axis produced the bar chart. To make the chart look better, I set the color palette and used a dark background.

  • Findings from the chart:

From the chart, I found that: 1) The percentages of both Black male and Black female victims increased from 2017 to 2018 after decreasing between 2015 and 2017. In addition, the percentage of Black female victims reached its highest point in 2019, even though that year has only about nine months of data. 2) The percentage of White male victims increased from 2015 to 2017 and declined from 2017 to 2018, while that of White female victims decreased from 2015 to 2018. 3) There were few Asian and Indian victims. 4) Black male victims always account for more than 75% of all victims.
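The percentages plotted here were built above with a separate totals table and a merge. An alternative sketch, not the approach used in this notebook, gets the same per-year totals in one step with `groupby(...).transform("sum")`. Only the 2015 and 2016 rows are retyped to keep the example short.

```python
import pandas as pd

# 2015 and 2016 rows copied from the shooting_RaceSex table above
race_sex = pd.DataFrame(
    {
        "Year": [2015] * 5 + [2016] * 5,
        "Race_and_Sex": ["A-M", "B-F", "B-M", "W-F", "W-M"] * 2,
        "Amount": [11, 71, 966, 31, 156, 8, 62, 988, 30, 190],
    }
)

# transform("sum") returns the yearly total aligned to every row
race_sex["Total"] = race_sex.groupby("Year")["Amount"].transform("sum")
race_sex["Percent"] = race_sex["Amount"] / race_sex["Total"] * 100

print(race_sex)
```

Because the totals stay aligned with the original rows, no second dataframe or join key is needed.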

In [295]:
# Get the number of shootings for each month
shooting_1 = shooting_used.groupby(['Year','Month'])['count'].count()
In [296]:
# Reset the index so that the index values are listed as columns in the data frame again
shooting_1 = shooting_1.reset_index()
In [297]:
# Rename the columns
shooting_1.columns = ['Year','Month','Shootings']
In [298]:
shooting_1
Out[298]:
Year Month Shootings
0 2015 1 82
1 2015 2 49
2 2015 3 63
3 2015 4 69
4 2015 5 121
5 2015 6 110
6 2015 7 144
7 2015 8 160
8 2015 9 121
9 2015 10 94
10 2015 11 100
11 2015 12 122
12 2016 1 94
13 2016 2 78
14 2016 3 100
15 2016 4 99
16 2016 5 91
17 2016 6 105
18 2016 7 154
19 2016 8 147
20 2016 9 107
21 2016 10 100
22 2016 11 115
23 2016 12 88
24 2017 1 96
25 2017 2 77
26 2017 3 84
27 2017 4 113
28 2017 5 111
29 2017 6 105
30 2017 7 117
31 2017 8 93
32 2017 9 124
33 2017 10 112
34 2017 11 94
35 2017 12 98
36 2018 1 82
37 2018 2 102
38 2018 3 80
39 2018 4 105
40 2018 5 144
41 2018 6 123
42 2018 7 123
43 2018 8 141
44 2018 9 123
45 2018 10 141
46 2018 11 115
47 2018 12 122
48 2019 1 88
49 2019 2 80
50 2019 3 130
51 2019 4 114
52 2019 5 115
53 2019 6 147
54 2019 7 123
55 2019 8 152
56 2019 9 47

Second Seaborn Chart: Heatmap

In [299]:
sns.set()
# Pivot the long-form dataframe to wide form (months as rows, years as columns)
shooting_2 = shooting_1.pivot(index="Month", columns="Year", values="Shootings")

# Draw a heatmap with the numeric values (number of shootings) in each cell
f, ax = plt.subplots(figsize=(10, 6))
sns.heatmap(shooting_2, annot=True, fmt="g", linewidths=.5, cmap='viridis', ax=ax)
Out[299]:
<matplotlib.axes._subplots.AxesSubplot at 0x19410f791d0>

Discussion

  • Process and reasons for making the graph:

Seaborn is a higher-level library built on Matplotlib, so it offers chart types beyond the basic plots. I wanted to check whether more shootings happened in specific months, so I drew a heatmap of the number of shootings by month and year. To make the chart easier to read, I set the color map to "viridis", which shows cells with larger numbers in lighter colors and cells with smaller numbers in darker colors.

  • Findings from the chart:

From the chart, I found that: 1) Shootings were generally high in summer (June to August) and low in winter, especially in January and February.
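The summer versus winter contrast can also be put into numbers. The sketch below retypes the January, February, June, July, and August counts for 2015 and 2016 from the `shooting_1` table, pivots them with keyword arguments, and averages the two seasons. The names `summer_mean` and `winter_mean` are introduced here for illustration.

```python
import pandas as pd

# Selected months for 2015 and 2016, copied from the shooting_1 table above
monthly = pd.DataFrame(
    {
        "Year": [2015] * 5 + [2016] * 5,
        "Month": [1, 2, 6, 7, 8] * 2,
        "Shootings": [82, 49, 110, 144, 160, 94, 78, 105, 154, 147],
    }
)

# Long form -> wide form: one row per month, one column per year
wide = monthly.pivot(index="Month", columns="Year", values="Shootings")

# Average monthly count per year for each season
summer_mean = wide.loc[[6, 7, 8]].mean()
winter_mean = wide.loc[[1, 2]].mean()

print(wide)
print(summer_mean)
print(winter_mean)
```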

Section 3. Interactive Altair Charts

In [300]:
# Import the package
import altair as alt
alt.renderers.enable('notebook')
Out[300]:
RendererRegistry.enable('notebook')
In [301]:
shooting_used
Out[301]:
objectid date_ race sex age wound officer_involved location latino point_x point_y fatal count Month Year Day
0 140176 2019-09-10 W F 23 head/multi N 2500 BLOCK N 9th St 1.0 -75.146521 39.990622 1.0 1 9 2019 10
1 140177 2019-09-10 B M 54 shoulder N 2500 BLOCK N 9th St 0.0 -75.146521 39.990622 0.0 1 9 2019 10
2 140178 2019-09-11 B M 23 back N 5500 BLOCK Linbergh Blvd 0.0 -75.216498 39.932082 0.0 1 9 2019 11
3 140191 2019-09-11 B M 34 hip N 3800 BLOCK Aspen St 0.0 -75.199901 39.966575 0.0 1 9 2019 11
4 140192 2019-09-11 B M 37 legs N 3800 BLOCK Aspen St 0.0 -75.199901 39.966575 0.0 1 9 2019 11
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
6206 134868 2016-02-07 B M 22 stom N 200 BLOCK W Nedro Ave 0.0 -75.124602 40.040384 0.0 1 2 2016 7
6207 134869 2016-02-11 B M 21 head N 5900 BLOCK Malta St 0.0 -75.100304 40.043923 0.0 1 2 2016 11
6208 134870 2016-02-11 B M 19 arm N 1800 BLOCK WSusquehana Ave 0.0 -75.163623 39.987211 0.0 1 2 2016 11
6209 134871 2016-02-09 B M 17 shoulder N 7400 BLOCK Frankford Ave 0.0 -75.037997 40.037857 0.0 1 2 2016 9
6210 134872 2016-02-13 B M 24 multi/head N 0 BLOCK N Paxon St 0.0 -75.223822 39.959912 1.0 1 2 2016 13

6134 rows × 16 columns

First Altair Chart: Bar Chart (Interactive)

In [302]:
# Remove the limitation of row numbers
alt.data_transformers.disable_max_rows()

# Plot the graph
alt.Chart(shooting_used).mark_bar().encode(
    y='race:O',
    x='mean(age):Q',color='race'
).transform_bin(
    'Race', field='race'
).interactive()
Out[302]:

Discussion

  • Process and reasons for making the graph:

The earlier charts already gave me a sense of how shootings differ across race and sex groups. Here, I wanted to investigate the mean age of the victims in each racial group. I put the mean age on the x-axis and race on the y-axis, and made the chart interactive.

  • Findings from the chart:

From the chart, I found that: 1) Asian victims have the highest mean age (above 35). 2) Black victims have the lowest mean age, which is below 30. Most victims are very young!
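The `mean(age)` aggregate in the chart corresponds to a group-wise mean in pandas. The sketch below runs on made-up toy rows, not the real data, and assumes the `age` column might contain non-numeric entries, which `pd.to_numeric(..., errors="coerce")` turns into missing values that the mean then skips.

```python
import pandas as pd

# Toy rows for illustration only; these are not values from the shootings file
toy = pd.DataFrame(
    {
        "race": ["A", "A", "B", "B", "B", "W"],
        "age": ["40", "36", "22", "31", "n/a", "29"],
    }
)

# Coerce ages to numbers; anything unparseable becomes NaN (assumed cleanup step)
toy["age"] = pd.to_numeric(toy["age"], errors="coerce")

# Mean age per race, ignoring missing values
mean_age = toy.groupby("race")["age"].mean()
print(mean_age)
```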

Second Altair Chart: Bar Chart

In [303]:
# Plot the graph
alt.Chart(shooting_used).mark_bar().encode(
    x='race:O',
    y='count()',color='sex'
).transform_bin(
    'binned_rating', field='race'
)
Out[303]:

Discussion

  • Process and reasons for making the graph:

This chart conveys information similar to the line chart and bar chart in the previous two sections. Here, however, I applied a bin transform and stacked both sexes within a single bar for each race.

  • Findings from the chart:

From the chart, I found that: 1) The majority of the victims are Black. 2) Male victims greatly outnumber female victims.
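The numbers behind a stacked bar chart like this one form a race-by-sex table. As an illustration, the sketch below retypes the 2018 rows from the `shooting_RaceSex` table and pivots them so that each row is one bar and each column is one colored segment.

```python
import pandas as pd

# 2018 rows copied from the shooting_RaceSex table above
rows_2018 = pd.DataFrame(
    {
        "Race": ["A", "A", "B", "B", "W", "W"],
        "Sex": ["F", "M", "F", "M", "F", "M"],
        "Amount": [2, 8, 94, 1094, 26, 177],
    }
)

# One row per race (bar), one column per sex (stacked segment)
table = rows_2018.pivot(index="Race", columns="Sex", values="Amount")

# Column sums give the overall male and female totals for the year
totals_by_sex = table.sum()

print(table)
print(totals_by_sex)
```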

Third Altair Chart: 2-Chart Dashboard (Scatterplot & Bar Chart)

In [304]:
# Include the brush selection in plot
brush = alt.selection(type='interval')

# The top scatterplot
upper = alt.Chart().mark_point().encode(
    alt.X('date_:T',scale=alt.Scale(domain=brush)),
    y='age:Q',
    color=alt.condition(brush, 'race:N', alt.value('lightgray'))
).properties(
    selection=brush,
    width=800
)

# The bottom bar plot
lower = alt.Chart().mark_bar().encode(
    y='race:N',
    color='race:N',
    x='count(race):Q'
).transform_filter(
    brush.ref()
).properties(
    width=800
).interactive()

chart = alt.vconcat(upper, lower, data=shooting_used) # vertical stacking
chart
Out[304]:

Discussion

  • Process and reasons for making the graph:

In this 2-chart dashboard, I plotted every shooting record from 2015 to 2019 as a point in a scatterplot to check the age and race distributions of the victims. Linking the two charts through the brush selection gave me a better sense of how many victims of each race fall within a selected period of time.

  • Findings from the chart:

From the chart, I found that: 1) The majority of the victims are Black. 2) Black victims seem particularly young.
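Conceptually, the brush keeps only the records inside the selected interval and the lower bar chart recounts them by race. A rough pandas analogue of that filter-then-count step is sketched below on made-up toy rows; the window bounds `start` and `end` stand in for a brush dragged over the date axis, and the real selection may also constrain age.

```python
import pandas as pd

# Toy rows for illustration only; these are not values from the shootings file
toy = pd.DataFrame(
    {
        "date_": pd.to_datetime(
            ["2015-03-01", "2016-07-15", "2017-06-26", "2018-08-02", "2019-09-10"]
        ),
        "race": ["B", "W", "B", "B", "W"],
    }
)

# Hypothetical brush window on the date axis (both ends inclusive)
start, end = pd.Timestamp("2016-01-01"), pd.Timestamp("2018-12-31")
in_window = toy["date_"].between(start, end)

# Recount by race using only the selected records
counts = toy.loc[in_window, "race"].value_counts()
print(counts)
```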

Fourth Altair Chart: 2-Chart Dashboard (Area Plot & Bar Chart)

In [305]:
# Include the brush selection in plot
brush = alt.selection(type='interval')

# The top area plot
lines = alt.Chart().mark_area().encode(
    alt.Y('sum(count):Q',scale=alt.Scale(domain=brush)),
    alt.X('Year:N',scale=alt.Scale(domain=brush)),
    color=alt.condition(brush, 'race:N', alt.value('lightgray'))
).properties(
    selection=brush,
    width=800
)

# The bottom bar plot
bars = alt.Chart().mark_bar().encode(
    y='race:N',
    color='race:N',
    x='count(race):Q'
).transform_filter(
    brush.ref() # the filter transform uses the selection
                # to filter the input data to this chart
).properties(
    width=800
).interactive()

chart = alt.vconcat(lines, bars, data=shooting_used) # vertical stacking
chart
Out[305]:

Discussion

  • Process and reasons for making the graph:

In this 2-chart dashboard, I plotted the total number of shooting records per year for each race as an area plot to see the number of victims in each race. Linking the two charts through the brush selection gave me a better sense of how the number of victims in each race changes over a selected span of years.

  • Findings from the chart:

From the chart, I found that: 1) The majority of the victims are Black. 2) From 2015 to 2017, the number of White victims increased more than that of any other race, and from 2017 to 2018, the number of Black victims increased the most.

Fifth Altair Chart: Faceted Line Charts

In [306]:
# Plot the graph with the number of shootings based on the races and sexes of victims
alt.Chart(shooting_used).mark_line().encode(
    x="Year:N", 
    y="sum(count):Q", 
    color="race:N"
).properties(width=200, 
             height=200
).facet(column="race",row="sex").interactive()
Out[306]:

Discussion

  • Process and reasons for making the graph:

For the last Altair chart, I made a faceted plot of 8 charts, one for each race-sex group. With the total number of shootings on the y-axis and the year (2015 to 2019) on the x-axis, I can easily compare these groups, especially groups of the same sex. In addition, because the charts are interactive, I can rescale the y-axis of all 8 charts at the same time, which helps me read the numbers more precisely.

  • Findings from the chart:

From the chart, I found that: 1) Black male victims far outnumber male victims of any other race. 2) The numbers of Asian and Indian victims are very small.
